25 research outputs found

    Snapshot Semantics for Temporal Multiset Relations (Extended Version)

    Snapshot semantics is widely used for evaluating queries over temporal data: temporal relations are seen as sequences of snapshot relations, and queries are evaluated at each snapshot. In this work, we demonstrate that current approaches for snapshot semantics over interval-timestamped multiset relations are subject to two bugs regarding snapshot aggregation and bag difference. We introduce a novel temporal data model based on K-relations that overcomes these bugs and prove it to correctly encode snapshot semantics. Furthermore, we present an efficient implementation of our model as a database middleware and demonstrate experimentally that our approach is competitive with native implementations and significantly outperforms such implementations on queries that involve aggregation. Comment: extended version of PVLDB paper.
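
As a rough illustration of what "evaluating a query at each snapshot" means, the sketch below takes the timeslice of an interval-timestamped multiset relation at every time point and then applies an ordinary (non-temporal) count or bag difference to it. The tuple layout `(value, begin, end)`, the half-open intervals, and the function names are assumptions made for this example; it is not the K-relation model or the middleware described in the abstract.

```python
from collections import Counter


def timeslice(relation, t):
    """Snapshot at time t: the multiset of values whose interval [begin, end) contains t."""
    return Counter(v for (v, begin, end) in relation if begin <= t < end)


def snapshot_count(relation, domain):
    """Evaluate COUNT(*) separately at every time point in `domain`."""
    return {t: sum(timeslice(relation, t).values()) for t in domain}


def snapshot_difference(r, s, domain):
    """Evaluate bag difference r - s separately at every time point in `domain`."""
    # Counter subtraction keeps only positive multiplicities (bag semantics).
    return {t: timeslice(r, t) - timeslice(s, t) for t in domain}


r = [("a", 1, 4), ("a", 2, 5), ("b", 3, 6)]
s = [("a", 2, 3)]
counts = snapshot_count(r, range(0, 7))
diff_at_2 = snapshot_difference(r, s, [2])[2]
```

Evaluating point by point like this is the reference behaviour that an interval-based implementation has to reproduce without materializing every snapshot.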

    Cache-efficient sweeping-based interval joins for extended Allen relation predicates

    We develop a family of efficient plane-sweeping interval join algorithms for evaluating a wide range of interval predicates such as Allen’s relationships and parameterized relationships. Our technique is based on a framework, components of which can be flexibly combined in different manners to support the required interval relation. In temporal databases, our algorithms can exploit a well-known and flexible access method, the Timeline Index, thus expanding the set of operations it supports even further. Additionally, employing a compact data structure, the gapless hash map, we utilize the CPU cache efficiently. In an experimental evaluation, we show that our approach is several times faster and scales better than state-of-the-art techniques, while being much better suited for real-time event processing

    Database Technology for Processing Temporal Data


    Leveraging range joins for the computation of overlap joins

    Joins are essential and potentially expensive operations in database management systems. When data is associated with time periods, joins commonly include predicates that require pairs of argument tuples to overlap in order to qualify for the result. Our goal is to enable built-in systems support for such joins. In particular, we present an approach where overlap joins are formulated as unions of range joins, which are more general purpose joins compared to overlap joins, i.e., are useful in their own right, and are supported well by B+-trees. The approach is sufficiently flexible that it also supports joins with additional equality predicates, as well as open, closed, and half-open time periods over discrete and continuous domains, thus offering both generality and simplicity, which is important in a system setting. We provide both a stand-alone solution that performs on par with the state-of-the-art and a DBMS embedded solution that is able to exploit standard indexing and clearly outperforms existing DBMS solutions that depend on specialized indexing techniques. We offer both analytical and empirical evaluations of the proposals. The empirical study includes comparisons with pertinent existing proposals and offers detailed insight into the performance characteristics of the proposals
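
The idea of writing an overlap join as a union of range joins can be sketched as follows. The sketch assumes closed integer intervals `(start, end)` with `start <= end` and uses nested loops for brevity; in a DBMS each of the two range predicates would instead be answered with an index such as a B+-tree on the start column. The function names are hypothetical, and the decomposition shown is one common way to split the overlap predicate, not necessarily the exact formulation used in the paper.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # closed interval [start, end]; an assumption of this sketch


def overlap_join(r: List[Interval], s: List[Interval]) -> List[Tuple[Interval, Interval]]:
    """Overlap join expressed as the union of two disjoint range joins."""
    # Range join 1: the start of s lies inside r (r.start <= s.start <= r.end).
    part1 = [(a, b) for a in r for b in s if a[0] <= b[0] <= a[1]]
    # Range join 2: the start of r lies inside s, strictly after the start of s
    # (s.start < r.start <= s.end), so no pair is produced twice.
    part2 = [(a, b) for a in r for b in s if b[0] < a[0] <= b[1]]
    return part1 + part2


def overlap_join_naive(r: List[Interval], s: List[Interval]) -> List[Tuple[Interval, Interval]]:
    """Reference implementation using the direct overlap predicate."""
    return [(a, b) for a in r for b in s if a[0] <= b[1] and b[0] <= a[1]]


r = [(1, 5), (10, 12)]
s = [(4, 8), (0, 1), (6, 9), (11, 11)]
result = overlap_join(r, s)
```

Because the two range predicates are mutually exclusive, the union needs no duplicate elimination.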

    Interval-Dependent Attributes in Relational Database Systems

    Data with time intervals is prominently present in finance, accounting, medicine and many other application domains. When querying such data, it is important to perform operations on aligned intervals, i.e., data is processed together only for the common interval where it is valid in the real world. For instance, an employee contributed to a project only for the time period where both the project was running and the employee was employed by the company, i.e., the employee contributed to the project only over their aligned time interval. A temporal join is thus only evaluated over the aligned interval of an employee and a project. The problem of performing temporal operations, such as temporal aggregation or temporal joins, on data with time intervals using relational database systems can be attributed to the lack of primitives for the alignment of intervals. Even more challenges arise when the data includes attribute values that are interval-dependent, such as project budgets or cumulative costs, and need to be scaled along with the alignment of intervals during processing. The goal of this thesis is to provide systematic and built-in support for querying data with intervals in relational database systems. The solution we propose uses two temporal primitives, a temporal normalizer and a temporal aligner, for the alignment of intervals. Temporal operators on interval data are defined by reduction rules that map a temporal operator to an operation with a temporal primitive followed by the corresponding traditional non-temporal operator that uses equality on aligned intervals. A key feature of our approach is that operators can access the original time intervals in predicates and functions, such as join conditions and aggregation functions, using timestamp propagation. Our approach, through timestamp propagation, supports the scaling of attribute values that are interval-dependent. When intervals are aligned during query processing, scaling can be performed at query time with the help of user-defined functions. This allows users to choose whether and how attribute values should be scaled. This is necessary since they may be interested in the total value in one query and the scaled value according to days or even working days in another query. We integrated our solution into the kernel of the open source database system PostgreSQL, which allows us to leverage existing query optimization techniques and algorithms.

    Snapshot Semantics for Temporal Multiset Relations

    Snapshot semantics is widely used for evaluating queries over temporal data: temporal relations are seen as sequences of snapshot relations, and queries are evaluated at each snapshot. In this work, we demonstrate that current approaches for snapshot semantics over interval-timestamped multiset relations are subject to two bugs regarding snapshot aggregation and bag difference. We introduce a novel temporal data model based on K-relations that overcomes these bugs and prove it to correctly encode snapshot semantics. Furthermore, we present an efficient implementation of our model as a database middleware and demonstrate experimentally that our approach is competitive with native implementations

    Query time scaling of attribute values in interval timestamped databases

    In valid-time databases with interval timestamping each tuple is associated with a time interval over which the recorded fact is true in the modeled reality. The adjustment of these intervals is an essential part of processing interval timestamped data. Some attribute values remain valid if the associated interval changes, whereas others have to be scaled along with the time interval. For example, attributes that record total (cumulative) quantities over time, such as project budgets, total sales or total costs, often must be scaled if the timestamp is adjusted. The goal of this demo is to show how to support the scaling of attribute values in SQL at query time
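
To make the notion of scaling concrete, here is a minimal sketch of one possible scaling function: a cumulative value is scaled in proportion to the number of days that remain after its interval is adjusted. The function name, the half-open `[start, end)` date intervals, and the choice of day-proportional scaling are assumptions for illustration; the demo itself addresses scaling in SQL, and other scaling policies (for example by working days) are equally plausible.

```python
from datetime import date
from typing import Tuple

Period = Tuple[date, date]  # half-open [start, end); an assumption of this sketch


def scale_by_days(value: float, original: Period, adjusted: Period) -> float:
    """Scale a cumulative value by the fraction of days kept in the adjusted interval."""
    original_days = (original[1] - original[0]).days
    adjusted_days = (adjusted[1] - adjusted[0]).days
    if original_days <= 0:
        raise ValueError("original interval must be non-empty")
    return value * adjusted_days / original_days


# A budget of 1000 recorded for 10 days, restricted to the first 5 of those days.
budget = scale_by_days(
    1000.0,
    (date(2020, 1, 1), date(2020, 1, 11)),
    (date(2020, 1, 1), date(2020, 1, 6)),
)
```

An attribute such as a project name would pass through unchanged when the interval is adjusted, whereas a total such as this budget is the kind of value that has to be scaled.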

    Continuous Imputation of Missing Values in Streams of Pattern-Determining Time Series

    Time series data is ubiquitous but often incomplete, e.g., due to sensor failures and transmission errors. Since many applications require complete data, missing values must be imputed before further data processing is possible. We propose Top-k Case Matching (TKCM) to impute missing values in streams of time series data. TKCM defines for each time series a set of reference time series and exploits similar historical situations in the reference time series for the imputation. A situation is characterized by the anchor point of a pattern that consists of l consecutive measurements over the reference time series. A missing value in a time series s is derived from the values of s at the anchor points of the k most similar patterns. We show that TKCM imputes missing values consistently if the reference time series pattern-determine time series s, i.e., the pattern of length l at time tn is repeated at least k times in the reference time series and the corresponding values of s at the anchor time points are similar to each other. In contrast to previous work, we support time series that are not linearly correlated but, e.g., phase shifted. TKCM is resilient to consecutively missing values, and the accuracy of the imputed values does not decrease if blocks of values are missing. The results of an exhaustive experimental evaluation using real-world and synthetic data show that we outperform the state-of-the-art solutions.
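
The pattern-matching idea can be sketched in a heavily simplified form. The version below assumes a single reference series, Euclidean distance between patterns, and the mean of the k matched values as the imputed value; all three choices, and the function name, are assumptions made for this example rather than details taken from TKCM itself.

```python
import math
from typing import List, Optional


def impute_last_value(s: List[Optional[float]], ref: List[float], l: int, k: int) -> float:
    """Impute the missing last value of s from the k most similar length-l patterns in ref."""
    n = len(ref) - 1                      # anchor point of the current pattern
    query = ref[n - l + 1 : n + 1]        # the l most recent reference measurements
    candidates = []
    for anchor in range(l - 1, n):        # historical anchor points only
        if s[anchor] is None:
            continue                      # no usable value of s at this anchor
        pattern = ref[anchor - l + 1 : anchor + 1]
        candidates.append((math.dist(query, pattern), anchor))
    top = sorted(candidates)[:k]          # k patterns with the smallest distance
    if not top:
        raise ValueError("no historical pattern available")
    return sum(s[anchor] for _, anchor in top) / len(top)


ref = [0.0, 1.0, 2.0, 0.0, 1.0, 2.0, 0.0, 1.0, 2.0]
s = [10.0, 20.0, 30.0, 10.0, 20.0, 30.0, 10.0, 20.0, None]
imputed = impute_last_value(s, ref, l=2, k=2)
```

In this toy input `s` is not a linear function of `ref` at every point in isolation, yet the repeated reference pattern still identifies the matching historical situations.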

    Modeling and querying facts with period timestamps in data warehouses

    In this paper, we study various ways of representing and querying fact data that are time-stamped with a time period in a data warehouse. The main focus is on how to represent the time periods that are associated with the facts in order to support convenient and efficient aggregations over time. We propose three distinct logical models that represent time periods as sets of all time points in a period (instant model), as pairs of start and end time points of a period (period model), and as atomic units that are explicitly stored in a new period dimension (period∗ model). The period dimension is enriched with information about the days of each period, thereby combining the former two models. We use four different classes of aggregation queries to analyze query formulation, query execution, and query performance over the three models. An extensive empirical evaluation on synthetic and real-world datasets and the analysis of the query execution plans reveal that the period model is the best choice in terms of runtime and space for all four query classes